Diffusion Models
Diffusion models are a class of generative models that learn a data distribution by gradually corrupting data with noise and training a model to reverse that corruption. They have gained prominence for their ability to generate high-quality samples in domains such as image and audio synthesis.
Overview
- Generative Modeling: Diffusion models aim to model the underlying data distribution by learning to reverse a predefined noising process.
- Noising Process: A forward process that gradually adds noise to the data until it reaches a simple, tractable distribution (approximately standard Gaussian).
- Denoising Process: A reverse process where the model learns to remove noise step by step to recover the original data.
Forward Diffusion Process
The forward process adds Gaussian noise to the data over $T$ timesteps.
- Markov Chain: Each noised sample depends only on the previous timestep.
- Gaussian Transitions: $q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$
- Variances: The $\beta_t$ are small positive constants controlling the noise schedule (the resulting closed form for $q(x_t \mid x_0)$ is given below).
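Because every transition is Gaussian, the forward process has a closed form that jumps directly from $x_0$ to $x_t$. Writing $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$:

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right), \qquad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, I).$$

This identity is what the q_sample function in the code example below implements.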
Reverse Diffusion Process
The model learns the reverse transitions to denoise the data.
- Learned Approximation: $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$
- Mean Prediction: The model predicts the mean $\mu_\theta(x_t, t)$ needed to reverse the diffusion (see the parameterization below).
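In the standard DDPM parameterization, the mean is expressed in terms of a network $\epsilon_\theta$ that predicts the noise added by the forward process, rather than predicting $\mu_\theta$ directly:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right).$$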
Training Objective
The objective is to minimize the variational bound on the negative log-likelihood.
- Simplified Loss Function: $L_{\mathrm{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\!\left[\,\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2\right]$
- Noise Prediction: The network $\epsilon_\theta$ predicts the noise $\epsilon$ that was added to produce $x_t$ at timestep $t$.
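For reference, the variational bound itself decomposes into per-timestep KL terms; the simplified loss above corresponds to dropping their relative weights:

$$L_{\mathrm{vlb}} = \mathbb{E}_q\!\left[ D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \sum_{t=2}^{T} D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1) \right].$$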
Denoising Diffusion Probabilistic Models (DDPM)
DDPMs are a widely used formulation of diffusion models in which both the forward and reverse processes are specified probabilistically.
- Forward Process: Adds noise according to a predefined schedule.
- Reverse Process: Learns to denoise using neural networks, typically U-Nets.
- Sampling: Starts from pure noise $x_T$ and iteratively denoises to obtain $x_0$.
Sampling Procedure
To generate new data:
- Initialization: Start with a noise sample $x_T \sim \mathcal{N}(0, I)$.
- Iterative Denoising: For $t = T$ down to $1$:
  - Predict $x_{t-1}$ from $x_t$ using the learned reverse process $p_\theta(x_{t-1} \mid x_t)$ (sketched in code below).
- Output: The final sample $x_0$ is the generated data.
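Below is a minimal sketch of this sampling loop, assuming a trained noise-prediction network with the same interface as the DiffusionModel in the code example later in this section and the same beta_t schedule; using $\sigma_t^2 = \beta_t$ for the reverse-process variance is one common choice:

import torch

@torch.no_grad()
def p_sample_loop(model, shape, beta_t):
    """Generate samples by iterating the learned reverse process."""
    alphas = 1.0 - beta_t
    alphas_cumprod = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t
    T = beta_t.shape[0]
    x_t = torch.randn(shape)                        # start from pure noise x_T
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t)        # timestep index for each sample
        eps_pred = model(x_t, t_batch)              # predicted noise epsilon_theta(x_t, t)
        # Posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x_t - beta_t[t] / torch.sqrt(1.0 - alphas_cumprod[t]) * eps_pred) / torch.sqrt(alphas[t])
        if t > 0:
            x_t = mean + torch.sqrt(beta_t[t]) * torch.randn_like(x_t)  # add sigma_t * z
        else:
            x_t = mean                              # final step: return the mean
    return x_t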
Applications
Image Generation
- High-Fidelity Images: Capable of generating images with fine details.
- Unconditional and Conditional Generation: Can generate images from scratch or based on input data.
Text-to-Image Synthesis
- Guided Diffusion: Incorporates text embeddings to guide image generation (a guidance sketch follows this list).
- Semantic Consistency: Produces images that align closely with textual descriptions.
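One common way to implement such guidance is classifier-free guidance, in which conditional and unconditional noise predictions are blended. The conditioning interface below (a cond keyword on the model) is a hypothetical illustration, not part of the code example later in this section:

# Classifier-free guidance: blend conditional and unconditional noise predictions.
# `model` is assumed (hypothetically) to accept an optional conditioning embedding.
def guided_noise(model, x_t, t, text_emb, guidance_scale=7.5):
    eps_uncond = model(x_t, t, cond=None)       # unconditional prediction
    eps_cond = model(x_t, t, cond=text_emb)     # text-conditioned prediction
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)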
Audio Generation
- Speech Synthesis: Generates realistic speech patterns.
- Music Generation: Creates novel musical compositions.
Code Example
Implementing a basic diffusion model training step in PyTorch (hyperparameter values below are chosen purely for illustration):
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Example hyperparameters (values chosen for illustration)
T = 1000            # number of diffusion timesteps
input_dim = 64      # dimensionality of each (flattened) training example
hidden_dim = 128
batch_size = 32
num_epochs = 10

# Define noise schedule and precompute cumulative products of alpha_t = 1 - beta_t
beta_t = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - beta_t, dim=0)

# Forward diffusion (adding noise): x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise
def q_sample(x_0, t, noise):
    sqrt_ac = alphas_cumprod[t].sqrt().unsqueeze(-1)               # shape (batch, 1)
    sqrt_one_minus_ac = (1.0 - alphas_cumprod[t]).sqrt().unsqueeze(-1)
    return sqrt_ac * x_0 + sqrt_one_minus_ac * noise

# Model (simplified; real implementations are usually U-Nets conditioned on t)
class DiffusionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x_t, t):
        return self.net(x_t)   # predicts the noise added at timestep t

# Training loop snippet (a random dataset stands in for real training data)
data_loader = DataLoader(torch.randn(1024, input_dim), batch_size=batch_size)
model = DiffusionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(num_epochs):
    for x_0 in data_loader:
        t = torch.randint(0, T, (x_0.shape[0],))    # random timestep per example
        noise = torch.randn_like(x_0)
        x_t = q_sample(x_0, t, noise)               # noised sample
        noise_pred = model(x_t, t)                  # predict the added noise
        loss = nn.MSELoss()(noise_pred, noise)      # simplified DDPM objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
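After training, new samples can be drawn by running the reverse process from pure Gaussian noise, for example with the p_sample_loop sketch from the Sampling Procedure section: samples = p_sample_loop(model, (16, input_dim), beta_t).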
Key Takeaways
- Diffusion Models provide a powerful framework for generative modeling by learning to reverse a noising process.
- Flexibility: They can be applied to various data types, including images, audio, and more.
- State-of-the-Art Results: Achieve competitive performance in generative tasks compared to GANs and VAEs.